A Direct Data-Cluster Analysis Method Based on Neutrosophic Set Implication
Abstract: Clustering techniques classify raw data in a reasonable manner to create disjoint
clusters. Many clustering algorithms based on specific parameters have been proposed to
handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic
set implication, i.e., a k-means algorithm combined with a threshold-based clustering
technique. The algorithm addresses the shortcomings of the k-means clustering algorithm
and overcomes the limitations of the threshold-based clustering algorithm. To evaluate the
validity of the proposed method, several validity measures and validity indices are applied
to the Iris dataset (from the University of California, Irvine, Machine Learning Repository),
alongside the k-means and threshold-based clustering algorithms. The proposed method
results in more segregated datasets with more compact clusters, thus achieving higher
validity indices.


Keywords: Data clustering, data mining, neutrosophic set, k-means, validity measures, 
cluster-based classification, hierarchical clustering. 


1 Introduction 


Today, data repositories have become the most favored systems; examples include
relational databases, data mining, and temporal and transactional databases. However, due
to the high volume of data in these repositories, prediction has become complex and
difficult. The data are also diverse (for example, from scientific to medical, geographic to
demographic, and financial to marketing). This diversity, together with the extensive
volume of the data, has led to the emergence of the field of data mining in recent years
[Hautamäki, Cherednichenko and Kärkkäinen et al. (2005)]. In addition, grouping data
objects into previously unknown classes (called clustering) has become a strong tool and a
favorite choice in recent years. In clustering, similar data objects are grouped together, and
dissimilar data objects are put into other groups. This is called unsupervised
classification. In unsupervised classification, analysis is done on dissimilar data objects or
raw information, and then the relationships among them are discovered without any
external interference. Several clustering methods exist in the literature, and they are broadly 
classified into hierarchical-based clustering algorithms and partitioning-based clustering 
algorithms [Reddy and Vinzamuri (2019); Rodriguez, Comin, Casanova et al. (2019)]. 
Some other types of clustering (probabilistic clustering, fuzzy-based clustering, density- 
and grid-based clustering) are also found in the literature [Aggarwal (2019); Nerurkar, 
Shirke, Chandane et al. (2018); Sanchez-Rebollo, Puente, Palacios et al. (2019); Zhang, He, 
Jin et al. (2020)]. 


In this work, we discuss a method built on the threshold value concept in a cluster-analysis
method based on neutrosophic set implication (NSI). Although the use of this method is
still in its infancy, we highlight the advantages of the proposed method over the k-means
algorithm. Neutrosophic systems use confidence, dependency, and falsehood (c, d, f) to
make uncertainty more certain; in other words, they decrease complexity. A neutrosophic
system is a paraconsistent approach because (per the falsehood theory of neutrosophic
sets) no event, task, or signal can be perfectly consistent until the job is done [Jha, Son,
Kumar et al. (2019)]. We intend to enhance the neutrosophic set in a detailed paraconsistent
plan that can be applied to clustering in various algorithms. Our contribution is to make
this approach result-oriented by correlating the neutrosophic components, i.e., confidence
and dependency, while justifying falsehood.


The rest of the paper is organized as follows. Section 2 presents related work and the 
advantages of NSI over a k-means algorithm. Section 3 discusses basic theory and definitions. 
Applications of two neutrosophic products (the neutrosophic triangle product and the 
neutrosophic square product) are described in Section 4. Section 5 discusses the direct 
neutrosophic cluster analysis method. The performance evaluation of the threshold-based
and k-means-based methods is presented in Section 6. Finally, Section 7 concludes the paper.


2 Related work 


Supervised and unsupervised learning are two fundamental categories of data analysis 
techniques. A supervised data analysis method includes training in the patterns for inferring 
a function from labeled training data; an unsupervised data analysis method includes 
unlabeled data. The unsupervised data analysis method uses an objective function to
maximize the similarity among similar objects and minimize it among dissimilar objects.
Previous work shows that data clustering is more complicated and challenging than data
classification because it falls under unsupervised learning. The main goal of data
clustering is to group similar objects into one group.

Recent works published on data clustering indicate that most researchers use k-means
clustering, hierarchical clustering, and similar techniques. In particular, Hu et al. [Hu,
Nurbol, Liu et al. (2010); Sanchez-Rebollo, Puente, Palacios et al. (2019)] have published
work in which it can be clearly seen that clustering is difficult because it is itself an
unsupervised learning problem. Most of the time, we are given a dataset and asked to infer
structure within it, in this case the latent clusters or categories in the data. This differs
from classification problems: deep artificial neural networks are very good at
classification, but clustering is still a very open problem. For clustering, we lack the
critical label information that classification relies on. This is why data clustering is more
complicated and challenging when unsupervised learning is considered. The authors believe
that the best example to illustrate this is predicting whether or not a patient has a common
disease based on a list of symptoms.


Many researchers [Boley, Gini, Gross et al. (1999); Arthur and Vassilvitskii
(2007); Cheung (2003); Fahim, Salem, Torkey et al. (2006); Khan and Ahmad (2017)]
have proposed partitioning-based methodologies, such as k-means, edge-based strategies,
and their variants. The k-means strategy is perhaps the most widely used clustering
algorithm; it is an iterative process that divides a given dataset into k disjoint groups.
Jain [Jain (2010)] presented a study that indicated the importance of the widely accepted
k-means technique. Many researchers have proposed variations of partitioning algorithms
to improve the efficiency of clustering algorithms [Celebi, Kingravi and Vela (2013);
Erisoglu, Calis and Sakallioglu (2011); Reddy and Jana (2012)]. Finding the optimal
solution of the k-means problem is NP-hard, even when the number of clusters is small
[Aloise, Deshpande, Hansen et al. (2009)]. Therefore, a k-means algorithm settles for a
local minimum as an approximately optimal solution.


Nayini et al. [Nayini, Geravand and Maroosi (2018)] overcame k-means weaknesses by 
using a threshold-based clustering method. This work also proposed a partitioning-based 
method to automatically generate clusters by accepting a constant threshold value as an 
input. The authors used similarity and threshold measures for clustering to help users
identify the number of clusters. They also identified outlier data and decreased its negative
impact on clustering. The time complexity of this algorithm is O(nk), which is better than
that of k-means [Mittal, Sharma and Singh (2014)]. In this algorithm, instead of providing
k initial centroids, only one centroid is taken, which is one of the data objects. Afterwards,
the formation of a new cluster depends upon the distance between the existing centroid and
the next randomly selected data object.
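The threshold-based idea described above can be illustrated with a minimal sketch. It is not the implementation of Nayini et al.; it assumes Euclidean distance, assignment to the nearest existing centroid, and a running-mean centroid update, none of which are specified in the text. The function name `threshold_clustering` and its parameters are hypothetical.

```python
import math
import random

def threshold_clustering(points, threshold, seed=0):
    """Leader-style sketch: objects are visited in random order; an object
    joins the nearest cluster if that centroid lies within `threshold`,
    otherwise it becomes the centroid of a new cluster."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)          # random visiting order (assumption)
    first = order[0]
    centroids = [list(points[first])]           # the first centroid is a data object
    members = [[first]]
    for idx in order[1:]:
        p = points[idx]
        dists = [math.dist(p, c) for c in centroids]   # k distance evaluations
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if dists[nearest] <= threshold:
            members[nearest].append(idx)
            n = len(members[nearest])
            # running-mean update keeps the per-object cost proportional to k
            centroids[nearest] = [c + (x - c) / n
                                  for c, x in zip(centroids[nearest], p)]
        else:
            centroids.append(list(p))           # distance too large: open a new cluster
            members.append([idx])
    return centroids, members
```

With one pass over n objects and at most k centroid comparisons per object, the sketch stays within the O(nk) cost quoted above; the number of clusters follows from the threshold rather than being supplied up front.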


Even on the same dataset, the results of clustering algorithms can differ from one another,
particularly the results from the k-means and edge-based techniques. Halkidi et al.
[Halkidi, Batistakis and Vazirgiannis (2000)] proposed quality scheme assessment and
clustering validation techniques [Halkidi, Batistakis and Vazirgiannis (2001)]. Clustering
algorithms produce different partitions for different values of the input parameters. The
scheme selects the best clustering scheme, and hence the best number of clusters, for a
specific dataset based on a defined quality index. The quality index validates and assures
good candidate estimation based on separation and compactness, the two components of
the index.

An index called the Davies-Bouldin index (DBI) was proposed [Davies and Bouldin
(1979)] for cluster validation. This validity index is, in fact, a ratio of within-cluster
scatter (compactness) to between-cluster separation, so lower values indicate better
clustering. In this internal evaluation scheme, the validation is done by evaluating
quantities and features inherent in the dataset.
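As an illustration of how such an internal index is evaluated, the following sketch computes the Davies-Bouldin index in its common form: for each cluster, the worst-case ratio of summed within-cluster scatter to centroid distance, averaged over clusters. The function name `davies_bouldin` is hypothetical, Euclidean distance is assumed, and at least two non-coincident clusters are required.

```python
import math

def davies_bouldin(points, labels):
    """Average over clusters of max_j (s_i + s_j) / d(c_i, c_j),
    where s is the mean distance to the centroid and d the centroid distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    # centroid of each cluster (coordinate-wise mean)
    cents = {k: [sum(col) / len(clusters[k]) for col in zip(*clusters[k])]
             for k in keys}
    # scatter: mean distance of members to their centroid
    scatter = {k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
               for k in keys}
    total = 0.0
    for i in keys:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in keys if j != i)
    return total / len(keys)
```

A compact, well-separated partition yields a small value, while a partition that mixes distant points into the same cluster yields a large one.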


Yeoh et al. [Yeoh, Caraffini and Homapour (2019)] proposed a unique optimized stream
(OpStream) clustering algorithm using three variants of OpStream. These variants were
taken from different optimization algorithms, and the best variant was chosen to analyze
robustness and resiliency. Uluçay et al. [Uluçay and Şahin (2019)] proposed an algebraic
structure of neutrosophic multisets that allows membership sequences. These sequences
have a set of real values between 0 and 1. Their proposed neutrosophic multigroup works
with neutrosophic multiset theory, set theory, and group theory. Various methods and
applications of the k-means algorithm for clustering have been worked out recently. Wang
et al. [Wang, Gittens and Mahoney (2019)] observed that kernel k-means identifies and
extracts a more varied collection of cluster structures than the linear k-means clustering
algorithm. However, kernel k-means clustering is computationally expensive when the
non-linear feature map is high-dimensional and there are many input points. On the other
hand, Jha et al. [Jha, Kumar, Son et al. (2019)] used a different clustering technique for
stock market prediction. They used rigorous machine learning approaches hand in hand
with clustering of a high volume of data.

Another study examined the applications of hierarchical (Ward, single, average, centroid,
and complete linkage) and k-means clustering techniques in air pollution studies covering
almost 40 years of data.


3 Neutrosophic basics and definitions 


In this section, we proceed with fundamental definitions of neutrosophic theory that include 
truth (T), indeterminacy (I) and falsehood (F). The degree of T, I, and F are evaluated with 
their respective membership functions. The respective derivations are explained below. 


3.1 Definitions in the neutrosophic set 


Let $S$ be a space of objects with generic elements $s \in S$. A neutrosophic set (NS) $N$ in
$S$ is characterized by a truth membership function $Q_N$, an indeterminacy membership
function $I_N$, and a falsehood membership function $F_N$. Here, $Q_N(s)$, $I_N(s)$, and
$F_N(s)$ are real standard or non-standard subsets of $]0^-, 1^+[$ such that
$Q_N, I_N, F_N : S \to\ ]0^-, 1^+[$. Tab. 1 shows the acronyms and nomenclatures used in the
definitions.


Table 1: Nomenclatures and acronyms

Nomenclature/acronym   Definition                             Nomenclature/acronym   Definition
$S$                    Space of objects                       OpStream               Optimized stream clustering
$N$                    Neutrosophic set                       T                      Truth
$Q_N$                  Truth membership function              I                      Indeterminacy
$I_N$                  Indeterminacy membership function      F                      Falsehood
$F_N$                  Falsehood membership function          NS                     Neutrosophic sets
$Q_N(s)$               Singleton subinterval or subset of S   $\square$              Square product
$I_N(s)$               Singleton subinterval or subset of S   $\triangleleft$        Triangular product
$F_N(s)$               Singleton subinterval or subset of S   $\rightarrow$          Lukasiewicz implication operator
NSI                    Neutrosophic set implication           CIN                    Intuitionistic neutrosophic implication
CNR                    Complex neutrosophic relations         CNS                    Complex neutrosophic sets


A singleton set, also called a unit set, contains exactly one element. For example,
the set {null} is a singleton containing the element null. The term is also used for a 1-tuple,
a sequence with one member. A singleton interval is an interval consisting of one such element.


Assume that the functions $Q_N(s)$, $I_N(s)$, and $F_N(s)$ are singleton subintervals or
subsets of the real standard, such that
$Q_N(s): S \to [0,1]$, $I_N(s): S \to [0,1]$, $F_N(s): S \to [0,1]$. Then, a simplification of
the neutrosophic set $N$ is denoted by

$$N = \{(s, Q_N(s), I_N(s), F_N(s)) : s \in S\}$$

with $0 \le Q_N(s) + I_N(s) + F_N(s) \le 3$. It is a simplified neutrosophic set, i.e., a subclass
of the neutrosophic set. This subclass of the neutrosophic set covers the notions of the
interval neutrosophic set and the single-valued neutrosophic set [Haibin, Florentin,
Yanqing et al. (2010); Ye (2014)].


3.2 Operations in the neutrosophic set 
Assume that $N_1$ and $N_2$ are two neutrosophic sets, where

$$N_1 = \{(s; Q_1(s); I_1(s); F_1(s)) \mid s \in S\} \quad \text{and} \quad N_2 = \{(s; Q_2(s); I_2(s); F_2(s)) \mid s \in S\}.$$

Then

a. $N_1 \subseteq N_2$ if and only if $Q_1(s) \le Q_2(s)$, $I_1(s) \ge I_2(s)$, $F_1(s) \ge F_2(s)$,
b. $N_1^c = \{(s; F_1(s); I_1(s); Q_1(s)) \mid s \in S\}$,
c. $N_1 \cap N_2 = \{(s; \min\{Q_1(s), Q_2(s)\}; \max\{I_1(s), I_2(s)\}; \max\{F_1(s), F_2(s)\}) \mid s \in S\}$,
d. $N_1 \cup N_2 = \{(s; \max\{Q_1(s), Q_2(s)\}; \min\{I_1(s), I_2(s)\}; \min\{F_1(s), F_2(s)\}) \mid s \in S\}$.
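The four operations can be tried out with a short sketch. It assumes a finite space of objects and represents each neutrosophic set as a dictionary mapping an element to its (truth, indeterminacy, falsehood) triple; the function names are hypothetical and both sets are assumed to share the same keys.

```python
def is_subset(n1, n2):
    """(a) N1 is contained in N2: lower truth, higher indeterminacy and falsehood."""
    return all(n1[s][0] <= n2[s][0] and n1[s][1] >= n2[s][1] and n1[s][2] >= n2[s][2]
               for s in n1)

def complement(n1):
    """(b) Swap the truth and falsehood degrees, keep the indeterminacy."""
    return {s: (f, i, q) for s, (q, i, f) in n1.items()}

def intersection(n1, n2):
    """(c) min on truth, max on indeterminacy and falsehood."""
    return {s: (min(n1[s][0], n2[s][0]),
                max(n1[s][1], n2[s][1]),
                max(n1[s][2], n2[s][2])) for s in n1}

def union(n1, n2):
    """(d) max on truth, min on indeterminacy and falsehood."""
    return {s: (max(n1[s][0], n2[s][0]),
                min(n1[s][1], n2[s][1]),
                min(n1[s][2], n2[s][2])) for s in n1}
```

When one set is contained in the other, the intersection returns the smaller set and the union the larger one, which is a quick sanity check on the definitions.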
3.3 Definition of states of a set
Former and latter

Let us assume that $V_i$ $(i = 1, 2)$ are two ordinary subsets with an ordinary relation
$R \subseteq V_1 \times V_2$. Then, for any $q \in V_1$ and $f \in V_2$, $Rf = \{q \mid qRf\}$ is
called the former set, and $qR = \{f \mid qRf\}$ is called the latter set.


3.4 Definition of neutrosophic algebraic products 
Triangle product and square product 


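Let us assume that $V_j$ $(j = 1, 2, 3)$ are ordinary subsets, with $R_1 \subseteq V_1 \times V_2$
and $R_2 \subseteq V_2 \times V_3$, such that the triangle product
$R_1 \triangleleft R_2 \subseteq V_1 \times V_3$ of $R_1$ and $R_2$ can be defined as follows:

$$e(R_1 \triangleleft R_2)g \Leftrightarrow eR_1 \subseteq R_2 g, \quad \text{for any } (e, g) \in V_1 \times V_3 \tag{1}$$

Correspondingly, the square product $R_1 \square R_2$ can be defined as follows:

$$e(R_1 \square R_2)g \Leftrightarrow eR_1 = R_2 g, \quad \text{for any } (e, g) \in V_1 \times V_3 \tag{2}$$

where $eR_1 = R_2 g$ if and only if $eR_1 \subseteq R_2 g$ and $eR_1 \supseteq R_2 g$.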
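For crisp relations, the two products can be checked with a few lines of code. The sketch below assumes the standard reading of the definitions: the triangle product relates e to g when the latter set of e under the first relation is contained in the former set of g under the second relation, and the square product requires the two sets to be equal. Relations are modeled as sets of pairs, and all names are hypothetical.

```python
def latter_set(rel, e):
    """eR = {g | e R g}: everything that e is related to."""
    return {g for (a, g) in rel if a == e}

def former_set(rel, g):
    """Rg = {e | e R g}: everything related to g."""
    return {e for (e, b) in rel if b == g}

def triangle_product(r1, r2, v1, v3):
    """Pairs (e, g) with eR1 contained in R2g."""
    return {(e, g) for e in v1 for g in v3
            if latter_set(r1, e) <= former_set(r2, g)}

def square_product(r1, r2, v1, v3):
    """Pairs (e, g) with eR1 equal to R2g."""
    return {(e, g) for e in v1 for g in v3
            if latter_set(r1, e) == former_set(r2, g)}
```

Because equality of sets implies containment in both directions, every pair in the square product also appears in the triangle product.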


3.5 Definition of neutrosophic implication operators 

Let $\alpha$ be an operation on $[0,1]$ such that
$\alpha(0,0,0) = \alpha(0,0,1) = \alpha(0,1,1) = \alpha(1,1,0) = \alpha(1,0,1) = \alpha(1,1,1) = 1$
and $\alpha(1,0,0) = 0$.

In this case, $\alpha$ is called a neutrosophic implication operator.

For any $a, b, c \in [0,1]$, $\alpha(a, b, c)$ is a neutrosophic implication operator. If we extend the
Lukasiewicz implication operator to the neutrosophic implication operator, then
$\alpha(a, b, c) = \min(1 - a + b + c, 1, 1)$.
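The boundary conditions listed in this definition can be verified directly for the extended Lukasiewicz operator. The following sketch simply evaluates min(1 - a + b + c, 1) at the seven corner points named above; the function name `luk3` is hypothetical.

```python
def luk3(a, b, c):
    """Extended Lukasiewicz operator: min(1 - a + b + c, 1)."""
    return min(1 - a + b + c, 1)

# Corner points at which the definition requires the value 1
must_be_one = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]

for corner in must_be_one:
    print(corner, luk3(*corner))

# The single corner point at which the definition requires the value 0
print((1, 0, 0), luk3(1, 0, 0))
```

Each of the first six corners gives 1 because 1 - a + b + c is at least 1 there, while (1, 0, 0) gives 1 - 1 + 0 + 0 = 0.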


3.6 Definition of generalized neutrosophic products 


Let us extend the Lukasiewicz implication operator to a neutrosophic valued environment.
If we consider only the membership degrees $Q_\mu$ and $Q_\nu$ of $\mu$ and $\nu$, for any two
neutrosophic valued environments $\mu = (Q_\mu, I_\mu, F_\mu)$ and $\nu = (Q_\nu, I_\nu, F_\nu)$, then
$\min\{1 - Q_\mu + Q_\nu, 1, 1\}$ is unable to reflect the dominance of the neutrosophic environment,
and therefore, we consider the indeterminacy and non-membership degrees $I_\mu, I_\nu$ and
$F_\mu, F_\nu$ as well. Now, we define the neutrosophic Lukasiewicz implication operator
$\Phi(\mu, \nu)$ based on the neutrosophic valued environmental components and the Lukasiewicz
implication operator. The membership degree, the degree of indeterminacy, and the
non-membership degree of $\Phi(\mu, \nu)$ are expressed as follows:

$$\min\{1, \min\{1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\}\} = \min\{1,\ 1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\}$$

and

$$\max\{0, \min\{1 - (1 - Q_\mu + Q_\nu),\ 1 - (1 - I_\mu + I_\nu),\ 1 - (1 - F_\mu + F_\nu)\}\} = \max\{0, \min\{Q_\mu - Q_\nu,\ I_\mu - I_\nu,\ F_\mu - F_\nu\}\} \tag{3}$$

respectively; i.e.,

$$\Phi(\mu, \nu) = \big(\min\{1,\ 1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\},\ \max\{0, \min\{Q_\mu - Q_\nu,\ I_\mu - I_\nu,\ F_\mu - F_\nu\}\}\big) \tag{4}$$

Let us prove that the value of $\Phi(\mu, \nu)$ satisfies the conditions of the neutrosophic valued
environment. In fact, from Eq. (4), we have

$$\min\{1,\ 1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\} \ge 0, \qquad \max\{0, \min\{Q_\mu - Q_\nu,\ I_\mu - I_\nu,\ F_\mu - F_\nu\}\} \ge 0 \tag{5}$$

and since

$$\max\{0, \min\{Q_\mu - Q_\nu,\ I_\mu - I_\nu,\ F_\mu - F_\nu\}\} = 1 - \min\{1, \max\{1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\}\} \tag{6}$$

and

$$\min\{1, \max\{1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\}\} \ge \min\{1,\ 1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\} \tag{7}$$

then

$$1 - \min\{1, \max\{1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\}\} + \min\{1,\ 1 - Q_\mu + Q_\nu,\ 1 - I_\mu + I_\nu,\ 1 - F_\mu + F_\nu\} \le 3 \tag{8}$$

This shows that the value of $\Phi(\mu, \nu)$ derived through Eq. (4) is a neutrosophic environment.
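The two expressions that make up the neutrosophic Lukasiewicz implication can be evaluated numerically with a small sketch. It assumes that a neutrosophic value is a (truth, indeterminacy, falsehood) triple and returns the pair of degrees given in Eq. (4); the function name and the sample values are illustrative only.

```python
def luk_implication(mu, nu):
    """Return the two degrees of the implication mu -> nu as in Eq. (4)."""
    diffs = [m - n for m, n in zip(mu, nu)]        # Q, I and F differences (mu minus nu)
    first = min(1.0, *(1 - d for d in diffs))      # min{1, 1-Q_mu+Q_nu, 1-I_mu+I_nu, 1-F_mu+F_nu}
    second = max(0.0, min(diffs))                  # max{0, min{Q_mu-Q_nu, I_mu-I_nu, F_mu-F_nu}}
    return first, second

def second_via_eq6(mu, nu):
    """Right-hand side of Eq. (6), used to cross-check the second degree."""
    return 1 - min(1.0, max(1 - (m - n) for m, n in zip(mu, nu)))

mu = (0.3, 0.4, 0.2)   # sample values, chosen only for illustration
nu = (0.7, 0.1, 0.4)
print(luk_implication(mu, nu))
print(luk_implication(mu, mu))
```

Both degrees stay within [0, 1] for inputs in [0, 1], and an implication from a value to itself evaluates to (1, 0).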


Along with the neutrosophic Lukasiewicz implication, the square product, and the 
traditional triangle product, we introduce the neutrosophic triangle product and the 
neutrosophic square product as follows. 


3.7 Definitions of neutrosophic relations 

Neutrosophic relations are based on conventional arithmetic, algebraic, and geometric
theories, which are used in dealing with various real-time engineering problems.
Neutrosophic relations also relate various neutrosophic sets.
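Triangle product
Let $\mu = \{\mu_1, \mu_2, \dots, \mu_p\}$, $\nu = \{\nu_1, \nu_2, \dots, \nu_r\}$, and
$\omega = \{\omega_1, \omega_2, \dots, \omega_q\}$ be three neutrosophic valued sets. If
$S_1 \in N(\mu \times \omega)$ and $S_2 \in N(\omega \times \nu)$ are two neutrosophic relations,
then a neutrosophic triangle product, $S_1 \triangleleft S_2 \in N(\mu \times \nu)$ of $S_1$ and
$S_2$, can be expressed as follows:

$$(S_1 \triangleleft S_2)(\mu_i, \nu_j) = \left( \frac{1}{q} \sum_{k=1}^{q} Q_{S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j)},\ \frac{1}{q} \sum_{k=1}^{q} I_{S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j)},\ \frac{1}{q} \sum_{k=1}^{q} F_{S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j)} \right) \tag{9}$$

for any $(\mu_i, \nu_j) \in (\mu \times \nu)$, $i = 1, 2, \dots, p$, $j = 1, 2, \dots, r$, where $\rightarrow$
represents the neutrosophic Lukasiewicz implication.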

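Square product

Similarly, we define the neutrosophic square product, $(S_1 \square S_2) \in N(\mu \times \nu)$ of
$S_1$ and $S_2$, as follows:

$$(S_1 \square S_2)(\mu_i, \nu_j) = \min_{1 \le k \le q} \begin{pmatrix} Q_{\min(S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j),\ S_2(\omega_k, \nu_j) \rightarrow S_1(\mu_i, \omega_k))}, \\ I_{\min(S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j),\ S_2(\omega_k, \nu_j) \rightarrow S_1(\mu_i, \omega_k))}, \\ F_{\min(S_1(\mu_i, \omega_k) \rightarrow S_2(\omega_k, \nu_j),\ S_2(\omega_k, \nu_j) \rightarrow S_1(\mu_i, \omega_k))} \end{pmatrix} \tag{10}$$

for any $(\mu_i, \nu_j) \in (\mu \times \nu)$, $i = 1, 2, \dots, p$, $j = 1, 2, \dots, r$.

Denote $s_{ik}$ as $S_1(\mu_i, \omega_k)$ for short, and similarly $s_{kj}$ as $S_2(\omega_k, \nu_j)$, for
convenience. Subsequently, we can simplify Eq. (9) and Eq. (10) as follows:

$$(S_1 \triangleleft S_2)(\mu_i, \nu_j) = \left( \frac{1}{q} \sum_{k=1}^{q} Q_{s_{ik} \rightarrow s_{kj}},\ \frac{1}{q} \sum_{k=1}^{q} I_{s_{ik} \rightarrow s_{kj}},\ \frac{1}{q} \sum_{k=1}^{q} F_{s_{ik} \rightarrow s_{kj}} \right) \tag{11}$$

$$(S_1 \square S_2)(\mu_i, \nu_j) = \min_{1 \le k \le q} \begin{pmatrix} Q_{\min(s_{ik} \rightarrow s_{kj},\ s_{kj} \rightarrow s_{ik})}, \\ I_{\min(s_{ik} \rightarrow s_{kj},\ s_{kj} \rightarrow s_{ik})}, \\ F_{\min(s_{ik} \rightarrow s_{kj},\ s_{kj} \rightarrow s_{ik})} \end{pmatrix} \tag{12}$$

Indeed, the neutrosophic triangle product and the neutrosophic square product are firmly
related to each other. That is, the neutrosophic triangle product is the basis of the
neutrosophic square product, and because of that, $(S_1 \square S_2)(\mu_i, \nu_j)$ is directly
derived from $(S_1 \triangleleft S_2)(\mu_i, \nu_j)$ and $(S_2 \triangleleft S_1)(\nu_j, \mu_i)$.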


4 Applications of the two neutrosophic products 


In this section, we use the neutrosophic triangle product to compare alternatives in
multi-attribute decision making with neutrosophic information. Subsequently, we use the
neutrosophic square product for constructing a neutrosophic similarity matrix. This
neutrosophic similarity matrix is used for analyzing the neutrosophic clustering method.


Assume a multiple attribute decision making issue. Let $W = \{w_1, w_2, \dots, w_n\}$ and
$N = \{N_1, N_2, \dots, N_m\}$ define sets of $n$ alternatives and $m$ attributes, respectively. The
attribute values (also called characteristics) of each alternative $w_i$ under all the attributes
$N_j$ $(j = 1, 2, \dots, m)$ represent the neutrosophic set. We make a decision based on the
multiple attributes:

$$w_i = \{(N_j, Q_{w_i}(N_j), I_{w_i}(N_j), F_{w_i}(N_j)) \mid N_j \in N\}, \quad i = 1, 2, \dots, n \text{ and } j = 1, 2, \dots, m \tag{13}$$

where $Q_{w_i}(N_j)$ denotes the degree of membership, $I_{w_i}(N_j)$ denotes the degree of
indeterminacy, and $F_{w_i}(N_j)$ denotes the degree of non-membership of $w_i$ to $N_j$.

Apparently, the degree of uncertainty of $w_i$ to $N_j$ is
$\psi_{w_i}(N_j) = 3 - Q_{w_i}(N_j) - I_{w_i}(N_j) - F_{w_i}(N_j)$.
Let $s_{ij} = (Q_{ij}, I_{ij}, F_{ij}) = (Q_{w_i}(N_j), I_{w_i}(N_j), F_{w_i}(N_j))$ be a neutrosophic value. An
$n \times m$ neutrosophic decision matrix, $S = (s_{ij})_{n \times m}$, can be constructed based on the
neutrosophic valued set $s_{ij}$ $(i = 1, 2, \dots, n;\ j = 1, 2, \dots, m)$.
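4.1 Neutrosophic triangle product’s application
The characteristic vectors of two alternatives for the issue described above, say $w_i$ and
$w_j$, can be expressed as $S_i = (s_{i1}, s_{i2}, \dots, s_{im})$ and $S_j = (s_{j1}, s_{j2}, \dots, s_{jm})$,
respectively. The neutrosophic triangle product can be calculated as follows:

$$(S_i \triangleleft S_j^{-1})_{ij} = \left( \frac{1}{m} \sum_{k=1}^{m} Q_{s_{ik} \rightarrow s_{jk}},\ \frac{1}{m} \sum_{k=1}^{m} I_{s_{ik} \rightarrow s_{jk}},\ \frac{1}{m} \sum_{k=1}^{m} F_{s_{ik} \rightarrow s_{jk}} \right) \tag{14}$$

This shows the degree to which alternative $w_j$ is preferred to alternative $w_i$, where
$S_j^{-1}$ is the inverse of $S_j$ and can be defined as $(S_j^{-1})_k = (S_j)_k = s_{jk}$,
$k = 1, 2, \dots, m$.

Similarly, we can calculate

$$(S_j \triangleleft S_i^{-1})_{ji} = \left( \frac{1}{m} \sum_{k=1}^{m} Q_{s_{jk} \rightarrow s_{ik}},\ \frac{1}{m} \sum_{k=1}^{m} I_{s_{jk} \rightarrow s_{ik}},\ \frac{1}{m} \sum_{k=1}^{m} F_{s_{jk} \rightarrow s_{ik}} \right) \tag{15}$$

This shows the degree to which alternative $w_i$ is preferred to alternative $w_j$. The
ordering of alternatives $w_i$ and $w_j$ can be obtained from Eqs. (14) and (15). In fact,

a. if $(S_i \triangleleft S_j^{-1})_{ij} > (S_j \triangleleft S_i^{-1})_{ji}$, alternative $w_j$ is preferred to $w_i$;
b. if $(S_i \triangleleft S_j^{-1})_{ij} = (S_j \triangleleft S_i^{-1})_{ji}$, there is similarity between $w_i$ and $w_j$;
c. if $(S_i \triangleleft S_j^{-1})_{ij} < (S_j \triangleleft S_i^{-1})_{ji}$, then $w_i$ is preferred to $w_j$.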


4.2 Neutrosophic square product’s application 


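As we know from Eq. (10), mathematically, the neutrosophic square product
$(S_1 \square S_2)_{ij}$ can be interpreted as follows: $(S_1 \square S_2)_{ij}$ measures the degree of
similarity of the $i$-th row of neutrosophic matrix $S_1$ and the $j$-th row of neutrosophic
matrix $S_2$. Therefore, considering the issue expressed at the start of Section 4,
$(S_i \square S_j^{-1})_{ij}$ expresses the similarity of alternatives $w_i$ and $w_j$. The following
formula can be used for constructing a neutrosophic similarity matrix for $w_i$
$(i = 1, 2, \dots, n)$:

$$\mathrm{sim}(w_i, w_j) = (S_i \square S_j^{-1})_{ij} = \min_{1 \le k \le m} \begin{pmatrix} Q_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})}, \\ I_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})}, \\ F_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})} \end{pmatrix} \tag{16}$$

Eq. (16) has the following desirable properties:
1. $\mathrm{sim}(w_i, w_j)$ is a neutrosophic value.
2. $\mathrm{sim}(w_i, w_i) = (1, 0)$ $(i = 1, 2, \dots, n)$.
3. $\mathrm{sim}(w_i, w_j) = \mathrm{sim}(w_j, w_i)$ $(i, j = 1, 2, \dots, n)$.

Proof for property 1

We can prove that $\mathrm{sim}(w_i, w_j)$ is a neutrosophic value. Since the results
$s_{ik} \rightarrow s_{jk}$ and $s_{jk} \rightarrow s_{ik}$ are all neutrosophic values, as proven
previously, the triple

$$\big( Q_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})},\ I_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})},\ F_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})} \big)$$

is a neutrosophic value for any $k$.

Proof for property 2

Since

$$\mathrm{sim}(w_i, w_i) = (S_i \square S_i^{-1})_{ii} = \min_{1 \le k \le m} \begin{pmatrix} Q_{\min(s_{ik} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{ik})}, \\ I_{\min(s_{ik} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{ik})}, \\ F_{\min(s_{ik} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{ik})} \end{pmatrix}$$

and Eq. (4) gives $s_{ik} \rightarrow s_{ik} = (1, 0)$, we know from definition (10) that
$\mathrm{sim}(w_i, w_i) = (1, 0)$.

Proof for property 3

Since

$$\mathrm{sim}(w_i, w_j) = (S_i \square S_j^{-1})_{ij} = \min_{1 \le k \le m} \begin{pmatrix} Q_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})}, \\ I_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})}, \\ F_{\min(s_{ik} \rightarrow s_{jk},\ s_{jk} \rightarrow s_{ik})} \end{pmatrix} = \min_{1 \le k \le m} \begin{pmatrix} Q_{\min(s_{jk} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{jk})}, \\ I_{\min(s_{jk} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{jk})}, \\ F_{\min(s_{jk} \rightarrow s_{ik},\ s_{ik} \rightarrow s_{jk})} \end{pmatrix} = (S_j \square S_i^{-1})_{ji} = \mathrm{sim}(w_j, w_i)$$

then $\mathrm{sim}(w_i, w_j) = \mathrm{sim}(w_j, w_i)$ $(i, j = 1, 2, \dots, n)$.

From the above analyses, we can determine that Eq. (16) satisfies the neutrosophic
similarity relation conditions. Thus, it can be used to construct a neutrosophic similarity
matrix.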
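A reduced sketch of the similarity-matrix construction is given below. To stay within what the text defines unambiguously, it computes only the first (membership-like) degree of the similarity: for each attribute it evaluates the first degree of the Lukasiewicz implication in both directions, keeps the smaller one, and then takes the minimum over all attributes. The restriction to one degree and all names are assumptions made for illustration.

```python
def luk_first(mu, nu):
    """First degree of the implication mu -> nu: min{1, 1-Q_mu+Q_nu, 1-I_mu+I_nu, 1-F_mu+F_nu}."""
    return min(1.0, *(1 - (m - n) for m, n in zip(mu, nu)))

def similarity_matrix(rows):
    """rows[i][k] is the (Q, I, F) value of alternative i under attribute k."""
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # min over attributes of the two-directional implication degree
            sim[i][j] = min(min(luk_first(a, b), luk_first(b, a))
                            for a, b in zip(rows[i], rows[j]))
    return sim

# Three alternatives described by two attributes (values chosen only for illustration)
decision = [
    [(0.7, 0.2, 0.1), (0.6, 0.3, 0.2)],
    [(0.6, 0.3, 0.2), (0.7, 0.2, 0.1)],
    [(0.2, 0.6, 0.7), (0.3, 0.5, 0.6)],
]
for row in similarity_matrix(decision):
    print(row)
```

The resulting matrix has ones on the diagonal and is symmetric, mirroring properties 2 and 3 for the degree that is computed.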


5 Direct neutrosophic cluster analysis method 


After constructing a neutrosophic similarity matrix with the abovementioned method, the
equivalent matrix is not required before cluster analysis. The required cluster analysis
results can be obtained without the neutrosophic equivalent matrix, starting directly from
the neutrosophic similarity matrix. In fact, Luo [Luo (1989)] proposed a direct method for
clustering fuzzy sets. This method considers only the membership degrees of fuzzy sets. Our
proposed direct neutrosophic cluster analysis technique considers the membership degrees,
indeterminacy degrees, and non-membership degrees of the neutrosophic valued set
under the neutrosophic conditions presented below. The proposed method is based on Luo’s
method and includes the following stages.

Stage A. Let $S = (s_{ij})_{n \times n}$ be the neutrosophic similarity matrix, where
$s_{ij} = (Q_{ij}, I_{ij}, F_{ij})$ $(i, j = 1, 2, \dots, n)$ is a neutrosophic valued set. To determine the
confidence level $\lambda_1$, select one of the elements of $S$ that obeys the following principles.

a. Rank the degrees of membership of $s_{ij}$ $(i, j = 1, 2, \dots, n)$ in descending order. Take
$\lambda_1 = (Q_{\lambda_1}, I_{\lambda_1}, F_{\lambda_1}) = (Q_{i_1 j_1}, I_{i_1 j_1}, F_{i_1 j_1})$, where
$Q_{i_1 j_1} = \max_{i,j} \{Q_{ij}\}$.
b. If there exist two neutrosophic valued sets, $(Q_{i_1 j_1}, I_{i_1 j_1}, F_{i_1 j_1})$ and
$(Q_{i_1 j_1}, I'_{i_1 j_1}, F'_{i_1 j_1})$ in (a), such that $I_{i_1 j_1} \ne I'_{i_1 j_1}$ and
$F_{i_1 j_1} \ne F'_{i_1 j_1}$ (without loss of generality, let $I_{i_1 j_1} < I'_{i_1 j_1}$ and
$F_{i_1 j_1} < F'_{i_1 j_1}$), then we choose the first one as $\lambda_1$, i.e.,
$\lambda_1 = (Q_{i_1 j_1}, I_{i_1 j_1}, F_{i_1 j_1})$.

Then, for each alternative $w_i$, let

$$[w_i]_S^{(1)} = \{w_j \mid s_{ij} = \lambda_1\} \tag{17}$$

Here, $w_i$ and all alternatives in $[w_i]_S^{(1)}$ are clustered into one category, and other
alternatives are clustered into another category.

Stage B. Select the confidence level $\lambda_2 = (Q_{\lambda_2}, I_{\lambda_2}, F_{\lambda_2}) = (Q_{i_2 j_2}, I_{i_2 j_2}, F_{i_2 j_2})$,
with $Q_{i_2 j_2} = \max_{(i,j) \ne (i_1, j_1)} \{Q_{ij}\}$. Specifically, if there exist at least two
neutrosophic valued sets whose membership degrees have the same value as $Q_{i_2 j_2}$, we
follow strategy (b) in Stage A. Now, let $[w_i]_S^{(2)} = \{w_j \mid s_{ij} = \lambda_2\}$; then, $w_i$ and
all the alternatives in $[w_i]_S^{(2)}$ are clustered into one type. Let the merger of
$[w_i]_S^{(1)}$ and $[w_i]_S^{(2)}$ be $[w_i]_S^{(1,2)}$. Then, the merged alternatives are
$[w_i]_S^{(1,2)} = \{w_j \mid s_{ij} \in \{\lambda_1, \lambda_2\}\}$, and therefore, $w_i$ and all
alternatives in $[w_i]_S^{(1,2)}$ are clustered into one set. The other alternatives remain unaltered.


Stage C. In this stage, we take other confidence levels and analyze clusters according to
the procedure in Stage B. The procedure is carried out until all alternatives are clustered
into one category. One of the significant advantages of the proposed direct neutrosophic
cluster analysis method is that cluster analysis can be realized simply by relying on the
subscripts of the alternatives. We observed from the process described above that,
in this method, computing even a λ-cutting matrix is not necessary.

In real-world application scenarios, we simply need to confirm the positions of the chosen
confidence levels in the neutrosophic similarity matrix, and afterward, we can obtain the
categories of the considered objects on the basis of their position subscripts.
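Stages A to C can be sketched as follows, under simplifying assumptions: the similarity matrix holds only membership degrees (plain numbers), the tie-breaking rule (b) is omitted, and merged groups are tracked with a union-find structure so that overlapping classes are joined transitively. The function name `direct_clustering` is hypothetical.

```python
def direct_clustering(sim):
    """Process confidence levels in descending order and merge the alternatives
    whose similarity equals the current level, until one cluster remains."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Distinct off-diagonal similarity values serve as the confidence levels
    levels = sorted({sim[i][j] for i in range(n) for j in range(i + 1, n)},
                    reverse=True)
    history = []
    for lam in levels:
        for i in range(n):
            for j in range(i + 1, n):
                if sim[i][j] == lam:        # only subscripts (i, j) are needed
                    parent[find(i)] = find(j)
        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        clusters = sorted(groups.values())
        history.append((lam, clusters))
        if len(clusters) == 1:              # Stage C stopping condition
            break
    return history

sim = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.3, 0.8, 1.0],
]
for level, clusters in direct_clustering(sim):
    print(level, clusters)
```

Each entry of the returned history pairs a confidence level with the partition reached at that level, so the full hierarchy is read off without building an equivalent matrix or a cut matrix.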


6 Performance evaluation 


For the performance evaluation, a k-means algorithm and a threshold-based algorithm were
used on the Iris dataset from the University of California, Irvine (UCI) Machine Learning
Repository. A variable number of clusters (from 2 to 10) was generated for the
experiments. For the k-means and threshold-based algorithms, the cluster number is the
input parameter. In the k-means algorithm, k data objects were selected randomly and taken
as the initial centroids of the clusters. On the other hand, only one object was selected
randomly in the threshold-based method. The selected object was assigned as the initial
centroid of the cluster and was a member of the first cluster. We observed that this method
generates more segregated and compact clusters. Finally, we observed that there was a
significant enhancement in the validity indices. The following mathematical analysis
supports the above statements.

For any cluster-based intuitionistic neutrosophic implication, let
$X(T_X, F_X) \rightarrow Y(T_Y, F_Y)$, where $T$ and $F$ depict truthfulness and falsehood.

Then, we can define various classes of cluster-based neutrosophic set (CNSS) implications,
as expressed below:

$$\mathrm{CNSS} = \big( [(1 - T_X) \vee_F T_Y] \wedge_F [(1 - F_Y) \vee_F F_X],\ F_Y \wedge_F (1 - T_X) \big) \tag{18}$$

The proposed new cluster-based intuitionistic neutrosophic (CIN) implication is now
extended with $X(T_X, I_X, F_X) \rightarrow Y(T_Y, I_Y, F_Y)$, as follows:

CIN1: $(T_X \rightarrow_{IF/F} T_Y,\ I_X \wedge_F I_Y,\ F_X \wedge_F F_Y)$, where $T_X \rightarrow_{IF/F} T_Y$ is any cluster of intuitionistic neutrosophic implications, while $\wedge_F$ is any neutrosophic conjunction;

CIN2: $(T_X \rightarrow_{IF/F} T_Y,\ I_X \vee_F I_Y,\ F_X \wedge_F F_Y)$, where $\vee_F$ is any fuzzy disjunction;

CIN3: $(T_X \rightarrow_{IF/F} T_Y,\ \frac{I_X + I_Y}{2},\ F_X \wedge_F F_Y)$;

CIN4: $(T_X \rightarrow_{IF/F} T_Y,\ \frac{I_X + I_Y}{2},\ \frac{F_X + F_Y}{2})$.

Referring to the definition proposed by Broumi et al. [Broumi, Smarandache and Dhar
(2014)], the classical logical equivalence and predicate relationship now becomes

$(X \rightarrow Y) \leftrightarrow (\neg X \vee Y)$, where $(X \rightarrow_N Y) \leftrightarrow_N (\neg_N X \vee_N Y)$.

The above class of neutrosophic implications can now be depicted with the operators
$(\neg_N, \vee_N)$. Let us have two cluster-based neutrosophic propositions: X(0.3, 0.4, 0.2) and
Y(0.7, 0.1, 0.4).

Then, $X \rightarrow_N Y$ has the neutrosophic truth value of $\neg_N X \vee_N Y$, i.e.,
$(0.2, 0.4, 0.3) \vee_N (0.7, 0.1, 0.4)$, or $(\max\{0.2, 0.7\}, \min\{0.4, 0.1\}, \min\{0.3, 0.4\})$, or
$(0.7, 0.1, 0.3)$. Here,

$\neg_N(t, i, f) = (f, i, t)$ for the neutrosophic negation

and

$(t_1, i_1, f_1) \vee_N (t_2, i_2, f_2) = (\max\{t_1, t_2\}, \min\{i_1, i_2\}, \min\{f_1, f_2\})$ for the neutrosophic
disjunction.
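The worked example with X(0.3, 0.4, 0.2) and Y(0.7, 0.1, 0.4) can be reproduced with a few lines of code. The sketch represents a neutrosophic proposition as a (t, i, f) tuple and builds the implication from the negation and disjunction rules stated above; the function names are hypothetical.

```python
def neg(x):
    """Neutrosophic negation: swap truth and falsehood, keep indeterminacy."""
    t, i, f = x
    return (f, i, t)

def disj(x, y):
    """Neutrosophic disjunction: max on truth, min on indeterminacy and falsehood."""
    return (max(x[0], y[0]), min(x[1], y[1]), min(x[2], y[2]))

def implies(x, y):
    """Implication expressed as the disjunction of the negated antecedent and the consequent."""
    return disj(neg(x), y)

X = (0.3, 0.4, 0.2)
Y = (0.7, 0.1, 0.4)
print(neg(X))          # (0.2, 0.4, 0.3)
print(implies(X, Y))   # (0.7, 0.1, 0.3)
```

Because only component-wise swaps, maxima, and minima are involved, the result is assembled from the input degrees without any rounding.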

The dataset that we used, from Stappers et al. [Stappers, Cooper, Brooke et al. (2016)]
and [Systems (2020)], contains 16,259 spurious examples caused by radio frequency
interference (RFI) or noise, and 1,639 real pulsar examples, with each candidate described
by eight continuous variables. The first four variables are obtained from the integrated
pulse profile, which is an array of continuous variables that describes a longitude-resolved
version of the signal. The remaining four variables are similarly obtained from the
dispersion measure (DM)-SNR curve. These are summarized in Tab. 2.


Tab. 2 shows a sample of pulsar candidates collected during the high time-resolution
universe survey. The first column (Mean) is the mean of the integrated profile, and SD, ET,
and Skewness are the corresponding statistics of the integrated profile. Mean1 is the mean
of the DM-SNR curve, and SD1 is the standard deviation of the DM-SNR curve. Finally,
ET1 is the excess kurtosis of the DM-SNR curve, and Skewness1 is the skewness of the
DM-SNR curve.


Table 2: Pulsar candidate samples collected during the high time-resolution universe survey

Mean       SD         ET         Skewness   Mean1      SD1        ET1        Skewness1  T/F
140.5625   55.68378   -0.23457   -0.69965   3.199833   19.11043   7.975532   74.24222
102.5078   58.88243   0.465318   -0.51509   .677258    4.86015    0.57649    127.3936
103.0156   39.34165   0.323328   1.051164   3.121237   21.74467   7.735822   63.17191
136.7500   57.17845   -0.06841   -0.63624   3.642977   20.95928   6.896499   53.59366
88.72656   40.67223   0.600866   1.123492   17893      1.46872    4.26957    252.5673
93.57031   46.69811   0.531905   0.416721   636288     4.54507    0.62175    131.3940
119.4844   48.76506   0.03146    -0.11217   0.999164   9.279612   9.20623    479.7566
130.3828   39.84406   -0.15832   0.38954    .220736    4.37894    3.53946    198.2365
107.2500   52.62708   0.452688   0.170347   2.33194    4.48685    9.001004   107.9725
107.2578   39.49649   0.465882   1.162877   4.079431   24.98042   7.397080   57.78474
142.0781   45.28807   -0.32033   0.283953   5.376254   29.0099    6.076266   37.83139
133.2578   44.05824   -0.08106   0.115362   1.632107   2.00781    11.97207   195.5434
134.9609   49.55433   -0.1353    -0.08047   10.69649   41.34204   3.893934   14.13121
117.9453   45.50658   0.325438   0.661459   2.83612    23.11835   8.943212   82.47559
138.1797   51.52448   -0.03185   0.046797   6.330268   31.57635   5.155940   26.14331
114.3672   51.94572   -0.0945    -0.28798   2.738294   17.19189   9.050612   96.6119
109.6406   49.01765   0.137636   -0.25670   1.508361   12.0729    13.36793   223.4384
100.8516   51.74352   0.393837   -0.01124   2.841137   21.63578   8.302242   71.58437
136.0938   51.691     -0.04591   -0.27182   9.342809   38.0964    4.345438   18.67365
99.36719   41.5722    1.547197   4.154106   27.55518   61.71902   2.208808   3.66268
100.8906   51.89039   0.627487   -0.02650   3.883779   23.04527   6.953168   52.27944
105.4453   41.13997   0.142654   0.320420   3.551839   20.75502   7.739552   68.51977
95.86719   42.05992   0.326387   0.803502   1.832776   12.24897   11.24933   177.2308


In Tab. 2, the mean of the integrated profile is compared with pulsar candidates that vary
significantly with Mean1. Here, Mean1 is the mean of the pulsar candidates at high time
resolution.
7 Conclusions and future work


One of the major issues in data clustering is the selection of the right candidates. In addition, 
the appropriate algorithm to choose the right candidates has been a challenging issue in 
cluster analysis, especially for an efficient approach that best fits the right sets of data. In 
this paper, a cluster analysis method based on neutrosophic set implication generates the 
clusters automatically and overcomes the limitation of the k-means algorithm. Our 
proposed method generates more segregated and compact clusters and achieves higher 
validity indices, in comparison to the mentioned algorithms. The experimentation carried 
out in this work focused on cluster analysis based on NSI through a k-means algorithm 
along with a threshold-based clustering technique. We found that the proposed algorithm 
eliminates the limitations of the threshold-based clustering algorithm. The validity 
measures and respective indices applied to the Iris dataset along with k-means and 
threshold-based clustering algorithms prove the effectiveness of our method. 

Future work will handle data clustering in various dynamic domains using neutrosophic 
theory. We also intend to apply a periodic search routine by using propagations between 
datasets of various domains. The data clustering used by our proposed algorithms was 
found to be workable in a low computational configuration. In the future, we will also use 
more datasets. 